feat(agents): add PR Walkthrough narrative orientation agent by dfinson · Pull Request #1947 · microsoft/hve-core

dfinson · 2026-06-14T13:04:43Z

Summary

Adds a PR Walkthrough agent that produces narrative-driven PR orientations. After reading the output, a reviewer understands what changed, why, how the pieces connect, which files carry architectural weight, and where human judgment is required.

This is not a findings tool — it builds the reviewer's mental model so they can review efficiently and notice what matters.

Motivation

As agent-generated code becomes the norm, PRs are growing larger (10–50+ files) and the bottleneck has shifted from writing code to reviewing it. A narrative walkthrough — structured like a tech blog rather than a robotic file list — makes large diffs tractable by establishing a mental model before the reviewer opens the diff.

This agent distills ~2 months of personal experimentation with 'review as narrative' into a generalizable flow that works for PRs of any size.

What's included

File	Purpose
`.github/agents/hve-core/pr-walkthrough.agent.md`	The agent definition
`collections/hve-core.collection.yml`	Registration in hve-core collection
`collections/hve-core-all.collection.yml`	Registration in hve-core-all collection
`plugins/` (generated)	Regenerated plugin outputs

Modes

Standalone: invoke directly with a base branch comparison
Orchestrated: reads `diff-state.json` when called as a subagent of PR Review
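
A minimal sketch of what the standalone base-branch comparison might look like, assuming it is computed with `git merge-base` followed by `git diff` (the commit messages in this PR mention a merge-base step, but the agent's exact commands are not shown in this thread). The function names and the `origin/main` default are illustrative, not taken from the agent definition.

```python
import subprocess


def merge_base_command(base_ref: str, head_ref: str = "HEAD") -> list[str]:
    """Build the git command that finds the common ancestor of two refs."""
    return ["git", "merge-base", base_ref, head_ref]


def diff_command(merge_base: str, head_ref: str = "HEAD") -> list[str]:
    """Build the git command that diffs the merge base against the head."""
    return ["git", "diff", merge_base, head_ref]


def compute_pr_diff(base_ref: str = "origin/main", head_ref: str = "HEAD") -> str:
    """Run both commands in the current repository and return the diff text.

    Assumption: the base branch is available locally as `base_ref`.
    """
    merge_base = subprocess.run(
        merge_base_command(base_ref, head_ref),
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    return subprocess.run(
        diff_command(merge_base, head_ref),
        check=True, capture_output=True, text=True,
    ).stdout
```

Diffing against the merge base rather than the tip of the base branch keeps unrelated commits that landed on the base branch out of the walkthrough's input.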

Key design decisions

Follows the idea of the change, not the file list
Every claim anchored to quoted code fragments
Proportional output (small PRs get brief treatment)
Surfaces design forks and implicit bets for human judgment without prescribing answers
Mandatory contextual research step with self-verification gate

Testing

Tested across 10 PRs of varying sizes (3 lines to 1074 lines) across hve-core, VS Code, and TypeScript repos. See example outputs in issue #1946.

Relates to #1946

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new “PR Walkthrough” agent to the HVE Core ecosystem and wires it into plugin packaging and collection indexes, alongside broad markdown table reformatting in collection docs.

Changes:

Added .github/agents/hve-core/pr-walkthrough.agent.md and registered it in hve-core / hve-core-all collections.
Added plugin agent pointer files for pr-walkthrough in both hve-core and hve-core-all.
Reformatted agent/prompt/instruction/skill tables across multiple collection markdown files (likely to improve rendering/consistency).

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

File	Description
plugins/hve-core/agents/hve-core/pr-walkthrough.md	Adds plugin-level pointer to the central `pr-walkthrough` agent definition.
plugins/hve-core/README.md	Documents the new `pr-walkthrough` agent in plugin README tables.
plugins/hve-core-all/agents/hve-core/pr-walkthrough.md	Adds plugin-level pointer to the central `pr-walkthrough` agent definition.
plugins/hve-core-all/README.md	Documents the new `pr-walkthrough` agent in plugin README tables.
collections/security.collection.md	Reformats auto-generated tables (agents/prompts/instructions/skills).
collections/project-planning.collection.md	Reformats auto-generated tables (agents/prompts/instructions/skills).
collections/jira.collection.md	Reformats auto-generated tables (agents/prompts/instructions/skills).
collections/installer.collection.md	Reformats auto-generated tables (instructions/skills).
collections/hve-core.collection.yml	Registers the new PR Walkthrough agent in the core collection.
collections/hve-core.collection.md	Adds `pr-walkthrough` to the core collection markdown listing + table reformat.
collections/hve-core-all.collection.yml	Registers the new PR Walkthrough agent in the “all” collection.
collections/hve-core-all.collection.md	Adds `pr-walkthrough` to the “all” collection markdown listing + table reformat.
collections/gitlab.collection.md	Reformats auto-generated tables (instructions/skills).
collections/github.collection.md	Reformats auto-generated tables (agents/prompts/instructions/skills).
collections/experimental.collection.md	Reformats auto-generated tables (agents/prompts/instructions/skills).
collections/design-thinking.collection.md	Reformats auto-generated tables (agents/prompts/instructions/skills).
collections/data-science.collection.md	Reformats auto-generated tables (agents/prompts/instructions).
collections/coding-standards.collection.md	Reformats auto-generated tables (agents/prompts/instructions/skills).
collections/ado.collection.md	Reformats auto-generated tables (agents/prompts/instructions/skills).
.github/agents/hve-core/pr-walkthrough.agent.md	Introduces the PR Walkthrough agent’s full instruction set and workflow.

Comments suppressed due to low confidence (1)

.github/agents/hve-core/pr-walkthrough.agent.md:1

This line contains an em dash character (—) while also stating they are banned. If the repository-style rule is meant to apply to authored markdown artifacts as well (not just the agent’s generated output), this file violates it. Consider replacing the literal em dash character with a textual description (e.g., “em dash”) to avoid introducing the banned glyph into the repo.

---

codecov-commenter · 2026-06-14T16:16:06Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.74%. Comparing base (a847cfa) to head (076df9c).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1947      +/-   ##
==========================================
- Coverage   80.82%   80.74%   -0.08%     
==========================================
  Files         117      127      +10     
  Lines       19095    19176      +81     
  Branches        0       12      +12     
==========================================
+ Hits        15433    15484      +51     
- Misses       3662     3689      +27     
- Partials        0        3       +3

Flag	Coverage Δ
docusaurus	`61.84% <ø> (?)`
pester	`84.64% <ø> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown.
see 11 files with indirect coverage changes

jkim323 · 2026-06-15T03:52:55Z

I love how this pr-walkthrough agent focuses on building the reviewer’s mental model. That output feels genuinely valuable.

One suggestion: consider modeling it as a subagent of pr-review rather than as a new top-level peer agent. To me, the strongest value here is orientation before inspection: helping the reviewer understand the shape of the diff, triage important files, and surface design forks before pr-review produces findings.

Keeping it under pr-review would also make the product boundary clearer: pr-review remains the user-facing coordinator for review findings and merge readiness, while pr-walkthrough provides the narrative orientation artifact. It could also reuse the existing diff/CI/tracking pipeline instead of duplicating that setup.

If we go this route, I’d suggest marking the subagent with maturity: experimental in the collection manifest since this capability is still new and being validated.

With this said, I would like to invite @agreaves-ms and @WilliamBerryiii for other perspectives regarding this thought! Thank you!

dfinson · 2026-06-15T08:11:49Z

Thanks for the feedback @jkim323. I tested this and want to share some context on this exact design tension I've been wrestling with.

I ran the walkthrough against 9 merged PRs and fed the output to a model acting as pr-review for focus-zone extraction. 9/9 high-confidence extractions, so the subagent model works mechanically. But there's a philosophical problem. This agent exists in its current form (i.e. 385 lines of attitude - professional, focused attitude, but still designed as a strong personality) because without the opinionated voice and strong point of view, the model immediately regresses to gluing English prose between diff hunks, using the hunks themselves as narrative scaffolding. IMO that kind of output isn't worth very much because it doesn't capture human attention, doesn't abstract by ideas, and doesn't structure around decisions. That style makes more sense if the agent is passing judgement (which pr review already does), less so if it needs to act as a lens for human attention. The personality itself seems to be what forces architectural thinking rather than line-by-line summarization.

Which means this thing embodies a fundamentally different philosophy than classical AI code review: PR Review scans the diff and formulates its own judgments, while this agent explicitly bans itself from judgment. Its job is to focus the human on what needs their judgment, not to replace it. It's for the person staring down a 45-file diff who needs orientation before they can effectively form their own opinions.

The problem with combining them sequentially is that the walkthrough says "here are the design forks, you decide" and then PR Review immediately says "this is wrong, fix it." In about 4 of 9 test cases the neutrality reads as theater once the verdict follows. They serve different audiences at different moments in the review lifecycle.

I considered three options: (1) standalone peer, different audiences, already interoperable via diff-state.json; (2) subagent of code-review-full, technically works but creates voice-whiplash in security/governance PRs; (3) dual registration, subagent AND standalone, users choose invocation path.

Based on @katriendg's review feedback, the branch now reflects option 1: standalone peer. The subagent registration and coding-standards collection entry have been removed. The agent stays in the hve-core collection only, marked maturity: experimental, with documentation at docs/agents/pr-walkthrough/. Pipeline interop via diff-state.json and the shared pr-reference skill remains wired for future orchestration if the team decides to revisit.

bindsi

Approved: the PR Walkthrough agent registration and generated artifacts are consistent with existing agent patterns. No actionable issues found.

katriendg

Thanks @dfinson, this is a very interesting addition to the platform, and I especially appreciate the fact you have been experimenting with this before submitting the contribution.
I look forward to fully testing it once it's been added to the repo and merged in.
Experimental is the right maturity fit for user testing.

I've left a few inline comments. First, I think there is some confusion between Code Review and PR Review: this new agent aligns with PR Review, not Code Review, which is an orchestrator for coding/programming rather than a generic PR reviewer. Let's keep this new agent outside of Coding Standards.

One important addition is needed before merging: the new agent must be documented. Add it to CUSTOM-AGENTS.md and, more importantly, give it its own dedicated page under the ./docs/agents/README.md docs.

jkim323 · 2026-06-16T04:48:32Z

Please ensure you ran these checks:

AI Artifact Contributions

Used /prompt-analyze to review contribution
Addressed all feedback from prompt-builder review
Verified contribution follows common standards and type-specific requirements

bindsi

Approved: the PR Walkthrough agent, documentation, and collection/plugin wiring are now consistent. I did not find actionable packaging or documentation issues in the current head.

dfinson · 2026-06-16T13:11:25Z

AI Artifact Contribution Checks

Re: @jkim323's checklist:

✅ Used `/prompt-analyze` to review contribution - Ran a full prompt-builder quality audit. Found 4 critical + 2 major issues.
✅ Addressed all feedback from `prompt-builder` review - Fixed across 4 commits, each A/B tested against 5 PRs with independent scoring subagents:
- Fix #3: Coherence tension (opinions vs editorializing clarification) - avg delta -0.10
- Fix #2: Bolded-prefix bullet removal - avg delta +0.37
- Fix #1: ALL CAPS removal (replaced with bold/italic) - avg delta +0.20
- Fix #4: Token budget (extracted voice to `walkthrough-voice.instructions.md`) - validated by 10-PR A/B experiment (8.44 vs 8.43, tied)
✅ Verified contribution follows common standards - Agent description under 120 chars, asterisk bullets, standard section naming (`## Required Steps`), no run-together paragraphs.

Remaining ALL CAPS in the file are legitimate: `BAD:`/`GOOD:` (example labels), `WEAKEN`/`KILL`/`COUNTER` (enum action labels), and git/code placeholders (`MERGE_BASE`, `HEAD`, `AUTHOR`).

- Add pr-walkthrough.agent.md for narrative-driven PR review orientation
- Register in hve-core and hve-core-all collections
- Add generated plugin symlinks
Relates to microsoft#1946
🚀 - Generated by Copilot
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…nd orchestrated path
- Replace angle-bracket placeholders with shell-safe variable patterns
- Clarify orchestrated mode still performs Step 1 hunk analysis
- Use command substitution for merge-base in fallback diff commands
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- add value proposition sentence establishing the agent's core purpose
- add BAD/GOOD editorial example demonstrating tradeoff presentation
- add stage-aware calibration for scaffold vs production code
- add COUNTER as 4th self-verification verdict for author-pushback prediction
- add quantity/softening refusal items
- add 'What Done Looks Like' 11-item completion checklist
🔧 - Generated by Copilot
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…d mark experimental
- Add PR Walkthrough to code-review-full agents list
- Add maturity: experimental to hve-core and hve-core-all collection entries
- Register pr-walkthrough in coding-standards collection as subagent dependency
- Regenerate plugins
🔧 - Generated by Copilot
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Remove subagent registration from code-review-full (not a code review tool)
- Remove from coding-standards collection (standalone agent, not a subagent)
- Fix description to remove subagent-of-PR-Review claim
- Fix shell placeholder (use literal MERGE_BASE variable, not prompt input syntax)
- Add voice convention note explaining why output voice differs from repo style
- Add documentation page in docs/agents/pr-walkthrough/
- Add entry to CUSTOM-AGENTS.md
- Regenerate plugins and extension manifests
🔧 - Generated by Copilot
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Replace dash bullets with asterisk bullets per repo convention
- Rename Pipeline section to Required Steps per protocol patterns
- Trim description to under 120 chars
- Fix run-together paragraphs (missing line breaks between bold items)
- Add sentence breaks between concatenated prose blocks
🔧 - Generated by Copilot
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Reword 'has opinions' and 'never editorialize' to remove contradiction
- Soften one ALL CAPS instance to bold emphasis
- A/B tested across 5 PRs: avg delta -0.10 (within noise)
🔧 - Generated by Copilot
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ules
- Convert '* **Title.** Description' to plain '* Description' format
- A/B tested across 5 PRs: avg delta +0.37 (net improvement)
🔧 - Generated by Copilot
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Convert THIS IS A BLOG POST to bold emphasis
- Replace NOT/YOUR/ISOLATING/PRESENTING with lowercase or italic
- Keep BAD/GOOD (example labels) and WEAKEN/KILL/COUNTER (enum labels)
- A/B tested across 5 PRs: avg delta +0.20 (no regression)
🔧 - Generated by Copilot
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…tions.md
- Move ~70 lines of voice/wit/rhetoric guidance from agent to instructions file
- Agent file references extracted instructions via auto-attach (applyTo pattern)
- Register new instructions in hve-core and hve-core-all collections
- Regenerate plugins and extension manifests
- A/B experiment (10 PRs) confirmed no quality regression from extraction
🔧 - Generated by Copilot
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The hve-core-all regenerator dropped maturity: experimental from sssc-planner.instructions.md and supply-chain-security skill entries during rebase conflict resolution.
🔧 - Generated by Copilot
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jkim323 · 2026-06-16T17:21:01Z

awesome thank you for doing that!!

katriendg

Thanks for the changes. A standalone agent next to PR Review in the HVE Core collection makes sense to me.
I've left a few comments: the orchestration mode that worked with the Code Review agent is still present and should now be removed. This touches both the agent itself and the docs.

katriendg · 2026-06-17T07:38:13Z

Thanks a lot for the reflections and move to standalone. I think it's the best choice.

A few changes remain. The main thing to call out is that we cannot leave the pipeline interop in place, other than the pr-reference skill, which is part of the same collection. The interactions through diff-state.json are for Code Reviews, so they belong to another collection and would not be bundled together. In that case we would also want to update the Code Review agent itself so that it knows about this new agent.

My vote is to keep it separate and standalone for an experimental testing phase. So let's not keep references to a pipeline that is not fully implemented anyway.

bindsi

Approved: the PR Walkthrough agent, documentation, and collection/plugin wiring are consistent in the current head. I did not find actionable packaging or documentation issues.

…endg review
- Remove diff-state.json input, orchestrated step block, and standalone/orchestrated split
- Collapse into single Diff Computation section feeding Required Steps pipeline
- Remove orchestrated section and pipeline integration from docs
- Fix blank lines in frontmatter
- Remove em-dash from walkthrough-voice.instructions.md
🔧 - Generated by Copilot
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The walkthrough-voice.instructions.md applyTo pattern does not fire in plugin/extension distribution contexts where the target path does not exist. Moving voice content back into the agent body makes the agent fully self-contained and portable across all distribution mechanisms.
🔧 - Generated by Copilot
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
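
To illustrate why a path-scoped `applyTo` pattern can stop attaching once the agent is redistributed, here is a minimal Python sketch. The glob `**/pr-walkthrough.agent.md` is a hypothetical pattern chosen for illustration (the actual pattern is not quoted in this thread), and Python's `fnmatch` only approximates how the tooling evaluates globs. The two paths are the source and plugin locations listed in this PR.

```python
from fnmatch import fnmatchcase

# Hypothetical applyTo glob; the real pattern is not quoted in this PR.
APPLY_TO = "**/pr-walkthrough.agent.md"

# Paths taken from the PR's file lists.
SOURCE_PATH = ".github/agents/hve-core/pr-walkthrough.agent.md"
PLUGIN_PATH = "plugins/hve-core/agents/hve-core/pr-walkthrough.md"


def instructions_attach(path: str, pattern: str = APPLY_TO) -> bool:
    """Return True when the glob matches, i.e. the instructions would attach."""
    return fnmatchcase(path, pattern)


print(instructions_attach(SOURCE_PATH))
print(instructions_attach(PLUGIN_PATH))
```

Under these assumptions the pattern matches the source path but not the plugin path, which is the kind of silent mismatch that inlining the voice guidance into the agent body avoids.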

dfinson · 2026-06-17T10:25:13Z

Agreed and done. Removed all diff-state.json references, orchestrated mode plumbing, and pipeline integration language. The agent is now fully standalone: computes its own diff via pr-reference skill, no cross-collection dependencies.

Also moved voice guidance back into the agent body per your inline comment about the applyTo pattern not working in plugin/extension contexts. Agent is self-contained and portable now.

All 8 inline comments addressed and resolved. CI is green.

dfinson requested a review from a team as a code owner June 14, 2026 13:04

dfinson requested a review from Copilot June 14, 2026 13:47

Copilot AI reviewed Jun 14, 2026

dfinson force-pushed the dfinson/feat-pr-review-narrative-walkthrough branch from abb6356 to 5f942cb Compare June 14, 2026 15:49

dfinson force-pushed the dfinson/feat-pr-review-narrative-walkthrough branch 3 times, most recently from 248d457 to 80a7a1a Compare June 14, 2026 19:32

bindsi approved these changes Jun 15, 2026

katriendg requested changes Jun 15, 2026

dfinson force-pushed the dfinson/feat-pr-review-narrative-walkthrough branch 2 times, most recently from 0db99c2 to dbe46e6 Compare June 15, 2026 14:39

bindsi approved these changes Jun 16, 2026

dfinson force-pushed the dfinson/feat-pr-review-narrative-walkthrough branch from 45d39a1 to bfb5283 Compare June 16, 2026 12:22

dfinson and others added 10 commits June 16, 2026 16:30

dfinson force-pushed the dfinson/feat-pr-review-narrative-walkthrough branch from ace3988 to 7bb8797 Compare June 16, 2026 13:33

dfinson requested a review from katriendg June 16, 2026 16:58

jkim323 approved these changes Jun 16, 2026

Merge branch 'main' into dfinson/feat-pr-review-narrative-walkthrough

94ebd33

katriendg reviewed Jun 17, 2026

bindsi approved these changes Jun 17, 2026

dfinson and others added 2 commits June 17, 2026 13:11
